The cellular origin of chronic lymphocytic leukemia (CLL) is still debated, although this 
information is critical to understanding its pathogenesis. Transcriptome analyses of CLL and 
the main normal B cell subsets from human blood and spleen revealed that immunoglobulin 
variable region (IgV) gene unmutated CLL derives from unmutated mature CD5+ B cells and 
mutated CLL derives from a distinct, previously unrecognized CD5+CD27+ post-germinal 
center B cell subset. Stereotyped V gene rearrangements are enriched among CD5+ B cells, 
providing independent evidence for a CD5+ B cell derivation of CLL. Notably, these CD5+ 
B cell populations include oligoclonal expansions already found in young healthy adults, 
putatively representing an early phase in CLL development before the CLL precursor lesion 
monoclonal B cell lymphocytosis. Finally, we identified deregulated proteins, including EBF1 
and KLF transcription factors, that were not detected in previous comparisons of CLL and 
conventional B cells. 



Chronic lymphocytic leukemia (CLL) is the 
most frequent B cell leukemia in elderly patients 
(Zenz et al., 2010). Approximately half of the 
cases of CLL carry unmutated Ig variable region 
(IgV) genes (uCLL), and the remaining cases 
have somatically mutated IgV genes (mCLL; 
Damle et al., 1999; Hamblin et al., 1999). This 
distinction is of biological interest and clinical 
relevance because uCLL is more aggressive with 
a significantly shorter time to first treatment 
(Rassenti et al., 2008). The identification of the 
cellular origin of CLL is essential to elucidating 
the pathobiology of a tumor. Only then can the 
full natural history of the disease be revealed and 
the dysregulation of gene expression and cellular 
functions be appreciated (Küppers et al., 1999). 
For CLL, the consistent expression of CD5 led 
to initial speculations that CLL might be a 
malignancy of CD5+ B cells (Caligaris-Cappio 
et al., 1982; Caligaris-Cappio, 1996), which, in 
mouse, represent a distinct B cell lineage (B1 
B cells; Dorshkind and Montecino-Rodriguez, 
2007). However, functional similarities between 
CLL and splenic marginal zone (sMGZ) B cells 
led to a proposal that CLL might be derived 
from such B cells (Chiorazzi and Ferrarini, 2011). 
Based on a study of specific IgV gene rearrangements, 
a derivation of uCLL from conventional 
naive B cells was proposed (Forconi et al., 2010). 
About 10 yr ago, detailed gene expression profil- 
ing (GEP) of CLL and normal human B cell sub- 
sets surprisingly indicated that mCLL and uCLL 
are similar to memory B cells, but not CD5+ 
B cells (Klein et al., 2001), indicating that both 
CLL subsets originate from antigen-experienced 
B cells (Klein et al., 2001; Rosenwald et al., 2001). 
This is supported by the finding that ~30% of 
CLL cases show highly similar IgV genes, which 
have been grouped into >150 sets of stereo- 
typed receptors (Stamatopoulos et al., 2007; 
Murray et al., 2008). This strongly suggests that 
such CLL recognized the same antigens, and 
hence B cell receptor (BCR) specificity plays a 
role in CLL pathogenesis. 

However, regarding the previous GEP 
studies (Klein et al., 2001; Rosenwald et al., 
2001), there are several caveats. First, none of 
these studies included sMGZ B cells. Second, 
in the previous most comprehensive gene expression study of CLL and
normal B cells, memory B cells were isolated as bulk CD27+ B cells
(Klein et al., 2001). However, approximately half of CD27+ B cells
are class-switched, and the remaining cells are mostly
IgM+IgD+CD27+ B cells (Klein et al., 1998), and few are IgM-only
B cells (IgDlow/-). Importantly, the generation of
IgM+IgD+CD27+ B cells in germinal center (GC) responses or
alternative pathways is still debated (Klein et al., 1998; Kruetzmann
et al., 2003; Seifert and Küppers, 2009; Weill et al., 2009).
Third, in the previous study including CD5+ B cells, these
were isolated from cord blood, in which practically all B cells
are CD5+ (Klein et al., 2001). However, it was recently reported
that a fraction of human peripheral blood (PB) B cells
are transitional, but not mature B cells, and that these cells are
CD5+ (Sims et al., 2005). Importantly, at birth the majority
of CD5+ B cells are transitional B cells (Ha et al., 2008;
Marie-Cardine et al., 2008; Sims et al., 2005). Hence, in the
previous GEP study, mostly transitional B cells and not mature
CD5+ B cells were compared with CLL. Because of these
restrictions, we performed a new GEP study of CLL in comparison
to normal naive, sMGZ, mature CD5+ and class-switched
cells, as well as IgM+ memory B cells.

© 2012 Seifert et al. This article is distributed under the terms of an Attribution-
Noncommercial-Share Alike-No Mirror Sites license for the first six months after
the publication date (see http://www.rupress.org/terms). After six months it is
available under a Creative Commons License (Attribution-Noncommercial-Share
Alike 3.0 Unported license, as described at http://creativecommons.org/licenses/
by-nc-sa/3.0/).

The Rockefeller University Press $30.00
J. Exp. Med. 2012 Vol. 209 No. 12 2183-2191
www.jem.org/cgi/doi/10.1084/jem.20120833

Additionally, we performed an IgV gene analysis from CD5+
and CD5- B cells, to search for the normal B cell subset in
which CLL-typical stereotyped BCR can be found. Both independent
studies revealed that mCLL and uCLL cells are most
closely related to mature CD5+ B cells. Thus, we conclude that
CLL is a malignancy of CD5+ B cells. Moreover, we identified a
small subpopulation of CD5+ B cells expressing CD27 and carrying
somatically mutated IgV genes. These putative post-GC
B cells may represent the physiological counterpart of mCLL.

RESULTS 

Human naive and CD5+ B cells show a gene expression 
pattern highly similar to CLL 

For a comprehensive analysis of differential gene expression between
CLL and normal human B cells, we isolated PB naive
B cells, memory B cell subsets (class-switched, IgM+IgD+CD27+,
and IgM-only B cells), CD5+ B cells (excluding transitional
B cells), and sMGZ B cells. Global RNA expression from five
to seven samples each was analyzed using Affymetrix HGU133
2.0 Plus arrays.

Between 500 and 3,000 transcripts selected according to 
highest SD were chosen for hierarchical clustering analyses. 
Fig. 1 shows a representative dendrogram of the 46 samples, 
based on 2,000 transcripts (Table S1, U133 HC). With the 
exception of IgM+IgD+CD27+ and IgM-only B cells (both 
referred to as IgM memory B cells), each B cell subset forms 
a separate branch, supporting the identification of consistent, 
subset-specific expression patterns. 
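
The selection and distance steps described above can be sketched in a few lines. This is only an illustration, assuming a small in-memory expression table (one list of per-sample intensities per transcript); the function names and the toy values are hypothetical, and the published analysis used dedicated array software rather than this code. The sketch keeps the transcripts with the highest SD and converts the Spearman rank correlation between two sample profiles into a distance (1 minus the correlation) that a hierarchical clustering routine could take as input.

```python
import statistics


def top_sd_transcripts(expr, n_top):
    """Return the n_top transcript IDs with the highest SD across samples.

    expr maps a transcript ID to its list of per-sample intensities.
    """
    ranked = sorted(expr, key=lambda tid: statistics.stdev(expr[tid]), reverse=True)
    return ranked[:n_top]


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_distance(x, y):
    """Distance between two profiles: 1 - Spearman rank correlation.

    Assumes neither profile is constant (otherwise the correlation is undefined).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return 1.0 - cov / (var_x * var_y) ** 0.5


# Hypothetical toy table: four transcripts measured in three samples.
expr = {
    "t1": [1.0, 9.0, 5.0],
    "t2": [4.0, 4.1, 4.2],
    "t3": [2.0, 8.0, 3.0],
    "t4": [7.0, 7.0, 7.1],
}
selected = top_sd_transcripts(expr, 2)
```

A distance of 0 means two samples rank the selected transcripts identically, and a distance of 2 means the rankings are exactly reversed.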

A separation of the 46 profiles into two major branches is
evident. The first major group includes naive and CD5+ B cells
and CLL, the second group consists of the three memory B cell
subsets, as well as sMGZ B cells. Thus, the normal B cells arrange
according to their IgV gene mutation status and relation
to GC experience (the majority of CD5+ B cells carry unmutated
V genes; Brezinschek et al., 1997; Fischer et al., 1997;
Dono et al., 2007), although the latter is controversially discussed
for sMGZ B cells (Weller et al., 2004). Importantly,
beyond a dominant gene expression signature shared by the CLL
subsets, both cluster in the branch of naive and CD5+ B cells,
indicating more similar gene expression to these cells than to
PB memory B cells or sMGZ B cells.

Figure 1. Hierarchical clustering of normal human B cell subsets and CLL.
The dendrogram is based on a Spearman ranking of 2000 transcripts with
highest SD (Table S1). Subcluster stability was confirmed by bootstrapping
procedure (>70%). The color bar depicts normalized intensity values. CD5+,
CD5+CD27-CD38low B cells; naive, conventional naive B cells; mCLL, IgV
mutated CLL; uCLL, IgV unmutated CLL; class-switched, IgG+CD27+ and
IgA+CD27+ B cells; IgM+IgD+ memory, IgM+IgD+CD27+ and IgM-only
B cells; sMGZ, splenic marginal zone B cells.

Principal component analysis (PCA) reveals 
a strong similarity between CLL and CD5+ 
but not conventional B cell subsets 

In addition to agglomerative hierarchical clustering, we cal- 
culated the similarity of transcriptional profiles using PCA. 
In Fig. 2 a, the distance of individual samples is depicted according
to the first and second principal components, explaining
35.2% of the total variance. CD5+ B cells and CLL
share a high degree of similarity in their respective expression
patterns (Fig. 2 a).
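
The statement that two principal components explain a given share of the total variance can be made concrete with a small sketch. It assumes a toy matrix with samples in rows and transcripts in columns, mean centering and scaling to unit SD (as named in the figure legends), and a plain power iteration instead of a numerical library; all function names and values are hypothetical and this is not the software used in the study.

```python
import math


def standardize(rows):
    """Mean-center every column and scale it to unit SD (rows = samples)."""
    n = len(rows)
    scaled_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled_cols.append([(v - mean) / sd for v in col])  # assumes sd > 0
    return [list(row) for row in zip(*scaled_cols)]


def covariance(rows):
    """Covariance matrix of centered data, one row/column per transcript."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_eigenpair(matrix, seed, iterations=500):
    """Largest eigenvalue and unit eigenvector by power iteration."""
    p = len(matrix)
    # A different start vector per component avoids restarting on a
    # direction that has already been removed by deflation.
    vec = [1.0 / (i + 1 + seed) for i in range(p)]
    value = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm < 1e-12:  # no variance left to explain
            return 0.0, vec
        vec = [v / norm for v in nxt]
        value = norm
    return value, vec


def explained_variance(rows, n_components=2):
    """Fraction of total variance captured by each leading component."""
    cov = covariance(standardize(rows))
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))
    fractions = []
    for k in range(n_components):
        value, vec = top_eigenpair(cov, seed=k)
        fractions.append(value / total)
        # Deflate: subtract the component just found before the next search.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(p)]
               for i in range(p)]
    return fractions


# Hypothetical data: five samples, three transcripts.
samples = [
    [2.0, 4.1, 1.0],
    [3.0, 6.2, 0.5],
    [4.0, 7.9, 1.5],
    [5.0, 10.1, 0.7],
    [6.0, 12.0, 1.2],
]
fractions = explained_variance(samples, 2)
```

Summing the first two entries of `fractions` gives the kind of figure quoted in the text for the first and second principal components.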

To dissect the relationship of CLL to normal B cell subsets
in more detail, we performed PCA based on gene lists derived
from pairwise comparisons (Fig. 2, b-f). Relying on the
indication of a high similarity of CLL to memory B cells
(Klein et al., 2001), we tested whether CLL is more similar to
either class-switched or IgM+IgD+CD27+ memory B cells.
In this PCA, CLL turned out to be considerably more similar
to the IgM memory B cells (Fig. 2 b). However, in line with
the hierarchical clustering analysis (Fig. 1), Fig. 2 c suggests a
higher similarity of CLL to naive than to IgM memory
B cells. This contradicts a previous publication (Klein et al.,
2001), although it should be mentioned that this study included
tonsillar B cells.

Importantly, when CD5+ B cells were compared with naive,
or any of the CD27+ B cell subsets, mCLL and uCLL were
significantly more similar to CD5+ B cells than to the other
subsets (Fig. 2 d-f and unpublished data). Notably, this high
similarity is already reflected at the level of simple pairwise
comparisons, where the number of differentially expressed
genes is minimal between CLL and CD5+ B cells (Table S2).
Thus, among the five major B cell subsets included in the
analysis, mature CD5+ B cells show the highest concordance
in gene expression to CLL, indicating a derivation of CLL
from these cells but not from memory or sMGZ B cells.

Genes differentially expressed between CLL 
and other B cell lymphomas reveal a CLL signature 
predominantly in normal CD5+ B cells 

So far, we identified specific gene signatures of normal B cell 
subsets to reveal their similarity to CLL gene expression 
patterns. In a complementary approach, we identified "CLL- 
specific" genes by comparing transcription profiles of CLL to 
other B cell lymphomas, and then analyzed which normal 
B cell subset is most similar to CLL for these genes. We gen- 
erated a list of 27 annotated transcripts that are differentially 
expressed between CLL and two other major types of mature
B cell lymphomas, i.e., follicular and diffuse large B cell
lymphomas, including the genes CTLA4, CD200, and BCL2
(Rosenwald et al., 2001). The PCA displayed CLL in closer
proximity to CD5+ B cells than to conventional naive, IgM
memory, and sMGZ B lymphocytes (Fig. 2 g). Similar results
were obtained by applying a list of 19 transcription factors
with specific expression in CLL, but not in diffuse large B
cell lymphoma, hairy cell leukemia, mantle cell lymphoma or
follicular lymphoma (Andreasson et al., 2010; unpublished
data). This further argues for a close relationship of normal
CD5+ B cells and CLL, as CLL-specific genes include typical
CD5+ B cell genes.

Figure 2. PCA of CD5+ and CD5- conventional B cell subsets and
CLL. (a) Unsupervised PCA shows a high similarity of CLL to CD5+ B cells,
but not conventional B cells. The PCA is based on 10,395 annotated transcripts,
explaining >35% of total variance. Axis scaling according to mean
centering and scaling. Samples belonging to distinct subsets are depicted
in the same color. (b-f) Supervised PCA showing mathematical distances
of mCLL (red) and uCLL (blue) according to the first principal component
of pairwise compared normal B cell subsets (annotated transcripts, >twofold
change; P < 0.05; FDR < 0.05). Subset samples are depicted on top or
bottom. Black bars represent maximum distances and baseline according
to mean centering and scaling. (g) Supervised PCA of CLL and selected
normal B cell samples. Shown are mathematical distances of CLL cases
and CD5+ and conventional B cell subsets according to 27 annotated
transcripts differentially expressed between CLL and follicular lymphoma
or diffuse large B cell lymphoma. Axis scaling according to mean centering
and scaling.

CD5+ PB B cells are clonally expanded 
and include a small post-GC B cell population 

The indication that CD5+ B cells are CLL precursors involves
an important caveat: most, if not all, CD5+ B cells are regarded
as pre-GC lymphocytes with unmutated IgV genes
(Brezinschek et al., 1997; Fischer et al., 1997), but approximately
half of CLL harbor mutated IgV genes. Hence, the
question arises whether a distinct subset of CD5+ B cells with
mutated IgV genes exists that may be the specific precursor
of mCLL. Indeed, there is a small fraction of CD5+ B cells
(4-17%) that coexpress the memory B cell marker CD27
(Klein et al., 1998; Fig. 3). Moreover, a small subpopulation
of these cells (between 0.5 and 2% of CD5+ PB B cells) is
class-switched to IgG or IgA (Fig. 3). An average fraction of
18.5% (7.5-42%) of CD5+CD27+ cells expressed CD43
(unpublished data), which has been proposed as a marker for
B1 B cells (Griffin et al., 2011).

Figure 3. Flow cytometric analysis of CD5+ human PB B cells of a
healthy donor. Depicted are FACS plots of CD19-enriched human lymphocytes
stained for CD5, CD27, IgG, and IgA expression. (top left) CD19+
B cells contain a small fraction of CD5+CD27+ B cells. CD5highCD27high
events represent residual T cells, as verified by CD3 staining (not depicted),
and therefore were excluded by gating procedure. (top right and
bottom left) CD5+CD27+ B cells contain minor populations of class-switched
IgG+ or IgA+ B cells. These stainings are specific, as there are
no IgA+IgG+ double-positive B cells detectable (bottom right). Data are
representative of 10 healthy donors.

To clarify whether CD27 expression marks somatically
mutated CD5+ B cells, we analyzed VH1 and VH3 family
gene rearrangements from CD5+ B cell subsets of four
healthy donors. As expected for a pre-GC B cell population,
CD5+CD27- B lymphocytes were consistently unmutated
(Table 1 and Table S3). Importantly, between 76 and
95% of VH genes from CD5+CD27+IgM+ and 72 and 100%
from CD5+CD27+ class-switched B cells were mutated with
average mutation frequencies of 2.7 and 2.2%, respectively.
That is in the range of 2-4%, typical for conventional IgM
memory B cells (Klein et al., 1997; Klein et al., 1998). Thus, we
have identified a distinct subset of IgV gene mutated human
PB CD5+ B cells, characterized by expression of CD27.
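
The two summary values reported per subset (percentage of mutated sequences and average mutation frequency) follow from simple per-sequence counts. The sketch below is a minimal illustration, assuming each rearrangement has already been aligned to its germline V segment and reduced to a (sequence, mutation count, aligned length) record; the record layout and the example values are hypothetical. Identical sequences are collapsed first, since they may derive from one cell, and the average is taken over all unique sequences, which is an assumption about how the published values were computed.

```python
def summarize_mutations(records):
    """Summarize IgV mutation status for one B cell subset.

    records: iterable of (sequence, n_mutations, aligned_length) tuples.
    Identical sequences are counted once, as they might derive from one cell.
    Returns (percent_mutated, average_mutation_frequency_percent).
    """
    unique = {}
    for sequence, n_mutations, length in records:
        unique[sequence] = (n_mutations, length)
    if not unique:
        raise ValueError("no sequences to summarize")
    mutated = sum(1 for n, _ in unique.values() if n > 0)
    percent_mutated = 100.0 * mutated / len(unique)
    # Mean of the per-sequence frequencies (mutations per 100 bp).
    avg_frequency = sum(100.0 * n / length for n, length in unique.values()) / len(unique)
    return percent_mutated, avg_frequency


# Hypothetical example: four reads, two of them identical.
records = [
    ("CAGGTGCAGCTG", 0, 200),
    ("CAGGTGCAGCTA", 4, 200),
    ("CAGGTGCAGCTA", 4, 200),
    ("CAGGTCCAGCTT", 8, 200),
]
percent_mutated, avg_frequency = summarize_mutations(records)
```

In this toy input, two of three unique sequences carry mutations, and the average frequency is the mean of 0, 2, and 4 mutations per 100 bp.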

Surprisingly, the VH sequences of the CD5+ B cell analysis
revealed a substantial fraction of clonally related sequences
within CD5+CD27-, CD5+CD27+IgM+, and CD5+CD27+
class-switched B cells, where 78%, 51%, and 92% of the sequences
were assigned to 31, 20, and 15 independent clones, respectively
(Table 1 and Table S3). For several reasons, most of these related
sequences represent expanded B cell clones. First, for 20-48% of
the clones, clonally related sequences were detectable within
independently processed duplicate aliquots of 10,000 cells from
the same CD5+ B cell population of a given donor (Table S3).
Second, expanded clones were never detected in CD5- B cells,
which were processed in parallel to the CD5+ B cells (Table 1).
Third, the sporadically observed intraclonal diversity among
sequences derived from CD5+CD27+ B cells is only compatible
with a derivation from distinct cells (Table S3).

Whether somatic hypermutation takes place only in GC
B cells is the subject of ongoing discussion (Kruetzmann et al.,
2003; Seifert and Küppers, 2009; Weill et al., 2009). We sought
to validate that mutated CD5+ B cells are post-GC B cells.
BCL6 mutations are a genetic trait of B cells mutating in the
GC and are found in 20-30% of conventional post-GC
memory B cells (Pasqualucci et al., 1998; Seifert and Küppers,
2009). Importantly, somatic hypermutation is strictly dependent
on target gene transcription (Fukita et al., 1998; Bachl
et al., 2001; Yang et al., 2006). Thus, Bcl6 mutations can only
occur in B cells when these cells acquire mutations in a GC
reaction, i.e., when Bcl6 is strongly transcribed (Klein and
Dalla-Favera, 2008). There is a low level of Bcl6 transcription
in conventional naive B cells (Ye et al., 1993), but this is
reduced when the cells undergo immune responses outside of
the GC (Allman et al., 1996; Marshall et al., 2011). Thus,
Bcl6 cannot be targeted by somatic hypermutation in extrafollicular
responses. Bcl6 is transcribed at higher levels for a
short time in pre-B cells (Nahar et al., 2011). However, these
pre-B cells lack substantial expression of AID (Sitte et al.,
2012), and although low-level AID expression has been assigned
to another (immature) precursor B cell population,
the AID expression level is considerably lower than in GC





The cellular origin of CLL | Seifert et al. 






Table 1. VH1 and VH3 gene analysis of human PB CD5+ and conventional B cell subsets

| Subset (a)         | Sequences | % mutated (range) | Average mutation frequency (%) (b) | Sequences assigned to clones | Number of clones |
| CD5+CD27-          | 148       | 5 (0-14)          | 0.04                               | 115                          | 31               |
| CD5+IgM+CD27+      | 145       | 87 (76-95)        | 2.7                                | 74                           | 20               |
| CD5+IgG/IgA+       | 126       | 82 (72-100)       | 2.17                               | 116                          | 15               |
| CD5- unmutated (c) | 73        | 0                 | 0                                  | 0                            | 0                |
| CD5- mutated (c)   | 52        | 100               | 2.63                               | 0                            | 0                |

(a) For CD5+ and conventional subsets total numbers of four and two healthy donors are given, respectively.

(b) For calculation of the average mutation frequency, identical sequences were counted once, as they might derive from one cell. If all sequences were considered, very similar
values were obtained (Table S3). In case of intraclonal diversity, each unique sequence was regarded as derived from an independent cell and counted once. Both in-frame and
out-of-frame rearrangements were considered.

(c) CD5-, conventional B cells were isolated and analyzed as CD19+ B cells and, afterward, separated into mutated and unmutated sequences.



B cells (Meyers et al., 2011) and the effective mutation load 
of the corresponding Ig genes is negligible (Kuraoka et al., 
2011). Hence, an extra-GC derivation of mutations in Bcl6, 
the master regulator of the GC B cell differentiation program 
(Klein and Dalla-Favera, 2008), can be neglected, and the 
detection of such mutations is a strong argument for a GC 
experience of the respective B cells. 

BCL6 mutations were indeed detected in CD5+CD27+
B cell subsets at a frequency similar to that found in conventional
memory B cells (Table S4). Collectively, we identified
CD5+CD27+ B cells as a distinct subset of PB CD5+ B cells
with somatically mutated V genes, which are GC experienced
and partly class-switched. Remarkably, we observed
clonal expansion among each of the CD5+ B cell subsets.

Derivation of uCLL from CD5+CD27- and mCLL 
from CD5+IgM+CD27+ B cells 

Because CD5+ B cells consist of pre- and post-GC lymphocytes,
we aimed to clarify whether uCLL and mCLL derive
from these normal B cell populations, respectively. We performed
an additional GEP analysis of CD5+IgM+CD27+ and
CD5+CD27- B cells, depleted for transitional and CD43+
B cells, from five healthy donors, and 5 mCLL and 4 uCLL
(a fifth case was excluded because of technical failure) on
HuGene-1_0-st-v1 arrays. Hierarchical clusterings based on
500-5,000 genes with highest SD showed that normal
CD5+CD27- and CD5+CD27+ samples were stably arranged
in one and CLL cases in another branch (Fig. 4 a and Table S5).
Besides one uCLL arranging with mCLL, all cell types clustered
separately. Thus, the distance tree mainly reflected differences
between normal and CLL cells, as expected from previous
publications (Klein et al., 2001; Rosenwald et al., 2001).

Figure 4. Unsupervised hierarchical clustering and supervised PCA of normal mature PB CD5+ B cell subsets and CLL samples. (a) Highly purified
CD5+CD27-CD38low naive B cells (yellow), CD5+CD27+CD38-CD43- memory B cells (green), and mCLL (red) and uCLL (blue) were included in a
HuGene-1_0-st-v1 GEP study. Hierarchical clustering is based on 500 transcripts with the highest SD, according to Manhattan clustering and average
linkage method. This clustering is representative for dendrograms based on 250-3,500 transcripts. (b) The PCA is based on 104 genes (twofold change;
P < 0.05; FDR < 0.05) differentially expressed between CD5+CD27- and CD5+CD27+ B lymphocytes. mCLL (red) and uCLL (blue) are displayed along the first
principal component, covering >59% of total variance. (c) PCA with 107 CLL cases from an independent exon expression study (Table S6). Only probe sets
scored with "best match" in a HuGene-HuExon array comparison (Affymetrix) were considered, resulting in 79 genes present on both platforms. The
abscissa depicts the Eigenvector values of the similarity matrix associated with the dataset according to mean centering (mean zero) and scaling (to unit SD).
The distribution of the data points along the ordinate was chosen arbitrarily to display all data points separately.

However, a PCA based on 104 genes significantly separating
CD5+CD27- from CD5+CD27+ B lymphocytes revealed
that mCLL tend to be more similar to CD5+CD27+ cells and
uCLL more similar to CD5+CD27- B cells (Fig. 4 b). As
expected, the transcriptionally very similar CLL samples were
not clearly separated (see Discussion). Therefore, we sought to
ascertain if the weak association of the CLL subtypes to either
mutated or unmutated CD5+ B cells was statistically significant.
A larger cohort of 107 CLL was subjected to PCA according
to the same algorithm covering 41.9% of the total
variance (Table S6). Importantly, although mCLL and uCLL
samples were overlapping, the distributions of the two CLL
subsets were significantly different (P < 0.002), and numerous
uCLL and mCLL were positioned closer to CD5+CD27- and
CD5+CD27+ B cells, respectively (Fig. 4 c).

Collectively, CD5+CD27- and CD5+CD27+ B cells show
consistent differences in their gene expression pattern, and uCLL
tend to be more similar to IgV unmutated CD5+ B cells, whereas
mCLL show a higher similarity to post-GC CD5+ B cells.

CD5+CD27- and CD5+CD27+ B cells, but not CD43+ B cells, 
preferentially express stereotyped BCR 

Up to 30% of CLL express stereotyped BCR. These are
defined by highly homologous heavy and light chain
complementarity-determining region 3 (CDR3) amino acid
sequences (>60% homology), as well as confined V-gene usage,
and either presence or absence of somatic mutations (Messmer
et al., 2004; Stamatopoulos et al., 2007). If mature CD5+
B cells are precursors of CLL, preferential expression of such
stereotyped BCR in these cells can be postulated. We identified
12 stereotyped IgV genes among 160 unique sequences
derived from CD5+ B cells (7.4%) and a single one among 107
sequences derived from conventional B cells (<1%) of four
healthy donors (Table 2 and Table S7). This difference is
statistically significant (P < 0.018).
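
The 60% CDR3 criterion named above can be illustrated with a simple identity calculation. This is only a sketch: it assumes two CDR3 amino acid sequences of equal length compared position by position, whereas published stereotype assignments apply further rules (for example on CDR3 length and V gene usage). The function names and example sequences are hypothetical.

```python
def cdr3_identity(cdr3_a, cdr3_b):
    """Percent of positions with identical amino acids in two equal-length CDR3s."""
    if len(cdr3_a) != len(cdr3_b) or not cdr3_a:
        raise ValueError("this sketch only compares non-empty CDR3s of equal length")
    matches = sum(1 for a, b in zip(cdr3_a, cdr3_b) if a == b)
    return 100.0 * matches / len(cdr3_a)


def is_stereotyped(cdr3_a, cdr3_b, threshold=60.0):
    """Apply the >60% homology cutoff described in the text."""
    return cdr3_identity(cdr3_a, cdr3_b) > threshold


# Hypothetical 10-residue CDR3s that differ at three positions.
query = "ARDGYSSGWY"
reference = "ARDAYSTGWF"
```

Here seven of ten positions match, so the pair passes the cutoff; a pair with exactly 60% identity would not, because the criterion is a strict "greater than".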

To further validate these findings we designed a PCR
specific for rearrangements using IGHV segments frequently
used in stereotyped BCR and analyzed two additional individuals
(Table S8). We included CD43+ B cells, as they were
recently proposed as human B1 B cells with phenotypic similarity
to CLL (Griffin et al., 2011). Again, the frequency of
stereotyped rearrangements was significantly higher among
CD5+ B cells as compared with CD5- B cells (18/145 versus
6/141, respectively; P < 0.018; Table 2 and Table S8). Importantly,
among the stereotyped receptors from both approaches,
eight of nine VH1-69 rearrangements were unmutated and
four of seven VH3-21, four of four VH3-23, and five of seven
VH4-34 rearrangements were mutated (unpublished data).
This correlation of VH gene usage and mutational status among
stereotyped BCR is in line with published data (Murray et al.,
2008). Furthermore, 19 out of the 30 stereotyped IGHV rearrangements
detected among mature CD5+ B cells could be
assigned to 6 of the 10 main CLL categories (Stamatopoulos
et al., 2007; Table S7). However, three stereotyped IGHV rearrangements
from class-switched CD5+ B cells did not belong
to typically class-switched CLL stereotypes (unpublished data).
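
The two frequency comparisons reported above (12 of 160 versus 1 of 107 unique sequences, and 18 of 145 versus 6 of 141) are 2x2 contingency tables, so Fisher's exact test can be recomputed directly from the counts. The sketch below is a plain two-sided implementation using only the standard library; whether the published P values were one- or two-sided is not stated in the text, so the two-sided choice is an assumption, and the function and variable names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one. Integer arithmetic keeps the
    comparison of table probabilities exact.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    lo = max(0, col1 - row2)
    hi = min(col1, row1)

    def weight(k):
        # Unnormalized hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(total, col1)


# Counts from the text: stereotyped versus non-stereotyped unique sequences,
# CD5+ B cells in the first row and conventional (CD5-) B cells in the second.
p_sequencing = fisher_exact_two_sided(12, 160 - 12, 1, 107 - 1)
p_targeted_pcr = fisher_exact_two_sided(18, 145 - 18, 6, 141 - 6)
```

Both calls return a P value that can be set against the thresholds given in the text and in Table 2.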

The CD43+ B cells frequently showed identical rearrangements,
indicating that these cells rarely use the IGHV segments
tested, and/or encompass expanded B cell clones. None of
the 72 IGHV region genes from CD43+ B cells included in
the analysis (representing 12 different rearrangements) was
stereotypic (Table 2 and Table S8).

Table 2. Stereotyped BCR in human PB conventional, CD5+, and CD43+ B cells

| Donors analyzed | Sample       | V genes obtained | V genes included in analysis (a) | Unique sequences (b) | Stereotyped receptors (b) | Fisher's exact test (c) |
| 4 (d)           | conventional | 143              | 125                              | 107                  | 1                         |                         |
| 4 (d)           | CD5+         | 460              | 419                              | 160                  | 12                        | P < 0.018               |
| 2 (e)           | conventional | 196              | 169                              | 141                  | 6                         |                         |
| 2 (e)           | CD5+         | 180              | 158                              | 145                  | 18                        | P < 0.018               |
| 2 (e)           | CD43+        | 115              | 72                               | 12                   | 0                         | P < 1                   |

(a) Only full-length V gene rearrangements and correct V-gene, i.e., VH1-69, VH3-21, VH3-23, VH3-48, and VH4-34 in stereotyped BCR-specific PCR approach.

(b) Identical sequences counted as one. When all sequences obtained are counted individually, the statistical significance is below P < 0.001. Stereotypy was determined based
on a 60% homology of the CDR3.

(c) P-values calculated by Fisher's exact test versus unique conventional B cell sequences.

(d) Total numbers of VH1 and VH3 gene analysis of four independent healthy blood donors are given, for details see Table S3.

(e) Total numbers of V genes amplified by PCR for gene segments frequently used by stereotyped BCR of two independent healthy blood donors are given, for details
see Table S7.

We conclude that stereotyped rearrangements are
predominantly found in mature CD5+ B cells at a frequency
similar to and reflecting the IGHV gene mutation pattern in
CLL. Moreover, CD43+ B cells do not fulfill the criteria to
be the specific precursors of CLL.

Similarities and differences between CD5+ PB B cell 
subsets and conventional B cells 

Having identified two subsets of mature CD5+ PB B cells
based on CD27 expression, we wondered how these relate to
conventional CD27- naive and CD27+ memory B cells in
their gene expression patterns. We generated a list of 519
transcripts that were differentially expressed with at least a
twofold change (P < 0.05 and FDR < 0.05) between conventional
naive and IgM memory B cells, including typical naive
(ABCB1, IL4R, and FCER2) and memory B cell genes
(CD27, ITGB2, CD80, CD95, and CD32). A heatmap of
the expression pattern of these genes for the two CD5+ B cell
subsets revealed that CD5+CD27- B cells resembled conventional
naive B cells and CD5+CD27+ B cells conventional
IgM memory B cells (Fig. 5 a and Table S9). This further argues
for a pre-GC differentiation stage of CD5+CD27- B cells
and a post-GC differentiation stage of CD5+CD27+ B cells.
However, the genes differentially expressed between conventional
naive and memory B cells (Table S9) and those differentially
expressed between CD5+CD27- and CD5+CD27+ B cells
(1.5 fold-change; P < 0.05; false discovery rate < 0.05; Table S9)
overlapped by only ~18%, although the majority of the remaining
transcripts showed similar expression tendencies (i.e., up- or
down-regulated in CD27- versus CD27+ B cells; unpublished
data). This indicates specific differences between conventional
and CD5+ B cells, in line with their separation in the
unsupervised hierarchical clustering (Fig. 4 a). To further analyze
these differences, we performed gene set enrichment analysis
(GSEA) of CD5+CD27- and conventional naive B cells. Our
main focus was on activated B cell signatures because CD5 is
also considered as a B cell activation marker (Berland and
Wortis, 2002) and we wanted to clarify whether CD5+ B cells
in human PB may simply represent activated conventional
B cells. Importantly, immediate early and delayed early activation
genes were consistently down-regulated in CD5+
versus conventional naive, IgM memory, or MGZ B cells (Fig. 6,
g and h). Moreover, CD5+ B cells did not show signs of
increased NF-κB activity or CD40 signaling (Fig. 6 i, Table S10,
and not depicted). This finding is in line with a previous publication
showing that CD5+ B cells do not express typical B cell
activation markers (Damle et al., 2002). Besides lacking an early
activation or increased NF-κB target gene signature (Table S10),
CD5+ B cells exhibited reduced cytokine, interleukin, TNF,
STAT3, and MAPK signaling according to GSEA when compared
with conventional naive B cells (unpublished data).
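
The transcript filter used at the start of this section (at least a twofold change, P < 0.05, and FDR < 0.05) can be sketched as follows. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment used here is an assumption, as are the function names, the input layout, and the example values.

```python
import math


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_transcripts(stats, min_fold=2.0, alpha=0.05):
    """Keep transcripts passing the fold-change, P value, and FDR cutoffs.

    stats maps a transcript ID to (log2_fold_change, p_value).
    """
    ids = list(stats)
    fdr = benjamini_hochberg([stats[t][1] for t in ids])
    cutoff = math.log2(min_fold)
    return [t for t, q in zip(ids, fdr)
            if abs(stats[t][0]) >= cutoff and stats[t][1] < alpha and q < alpha]


# Hypothetical per-transcript statistics: (log2 fold change, P value).
stats = {
    "geneA": (1.8, 0.001),
    "geneB": (-1.2, 0.004),
    "geneC": (0.4, 0.002),
    "geneD": (2.5, 0.20),
}
selected = select_transcripts(stats)
```

In this toy input, geneC fails the fold-change cutoff and geneD fails the P value cutoff, so only the first two transcripts are kept.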

Finally, according to GSEA, CD5+ B cells showed signs
of homeostatic proliferation (Fig. 6 j). As the majority of
CD5+ B cells is resting, with >99% of the cells in G0/G1 phase
(unpublished data), this could mean that a small fraction of
CD5+ B cells proliferates. In line with this view, we detected
an increased expression of MYC target genes in CD5+ B cells
(Fig. 6 k), especially in the CD27+ subset (unpublished data).
Fluorescence microscopic analysis of CD5+ B cells revealed
that a small fraction of CD5+ B cells (<1%) does show nuclear
MYC expression (Fig. 6 a).

Collectively, CD5+ B cells do not display an activation
signature in comparison to conventional naive B cells, but
there is a signature for preferential homeostatic proliferation
in CD5+ B cells. Determining a full picture of the biological
properties of CD5+ B cells will require additional detailed studies
of mature conventional and CD5+ B cell subsets, which is
beyond the scope of this study.

Evaluation of genes with similar expression 
in CD5+ B cells and CLL 

We wondered whether there are genes that were previously
regarded as deregulated in CLL in comparison to conventional
B cells, but that do not show deregulation in comparison
of CLL to CD5+ B cells, which we now consider as their
normal counterpart. A list of selected genes with previously
proposed functional relevance for CLL is shown in Fig. 5 b
and Table S11.

FOXP1, a transcription factor involved in lymphocyte
development and with a potential role of an oncogene in lymphomas
(Koon et al., 2007), is increased in CLL in comparison
with conventional B cells (Korac et al., 2009). Similarly,
the transcription factor LEF1, which has important functions
in lymphopoiesis, promotes CLL survival and is not expressed
in conventional mature B cells (Gutierrez et al., 2010; Tandon
et al., 2011). Both transcription factors are transcribed in
CD5+ B cells at a level similar to that found in CLL cells, suggesting
they already have functions in normal CD5+ B cells
and remain active after transformation into CLL. In line with
this view, GSEA shows that LEF1 target genes are up-regulated
in CD5+ versus conventional naive B cells (unpublished data).

The tyrosine kinase LCK is involved in S phase transition 
and apoptosis (Paterson et al., 2006) and tetraspanin 
family member CD9 modulates adhesion and migration of 
the neoplastic B cells (Barrena et al., 2005). These genes 
were previously reported to be up-regulated in CLL. However, 
they are expressed at similar levels in normal CD5+ 
B cells, indicating that their expression pattern reflects unique 
signaling potential and migration properties of the CD5+ 
precursor population. 

IL-24 is expressed in CLL and contributes to tumor 
survival (Sainz-Perez et al., 2006). Normal CD5+ B cells show 
similarly high transcript levels of IL-24 (Fig. 5 b). Interestingly, 
the role of IL-24 in normal B cells is to inhibit the 
plasma cell differentiation program in GC B cells (Maarof 
et al., 2010), thus IL-24 may also contribute to a restrained 
differentiation potential in CLL. 

Members of the Rho family subclass of small GTPases 
showed similar expression levels in CD5+ B cells and CLL, 
but not conventional B cells. RHOA, RHOB, and RHOC 
proteins promote reorganization of the actin cytoskeleton 
and regulate cell shape, attachment, and motility (Bustelo 
et al., 2007). rhoB-deficient animals indicate tumor-suppressing 
properties of this small GTPase, whereas rhoC-deficient 
tumor cells showed reduced migration and lower 
invasiveness, thus indicating a beneficial role for RHOC in 
tumor development (Bustelo et al., 2007). Remarkably, 
low expression of RHOB, but high levels of RHOA and 
RHOC are already detectable in normal CD5+ B cells, 
suggesting that the CLL tumor cells may profit from taking 
over this expression pattern. In line with this result, GSEA 
revealed shared characteristics of CD5+ B cells and 
CLL, for example, reduced G protein-coupled receptor 
activity and interaction with extracellular matrix receptors 
(unpublished data), in contrast to conventional B cells. 

Figure 5. Evaluation of CD5+ B cell and CLL gene expression profiles. (a) Heatmap of CD5+ B cell subset transcription patterns of genes with 
differential expression between conventional naive and IgM+ memory B cells (Table S9). The color bar depicts normalized intensity values. (b) Heatmap of 
selected genes with similar transcription in CD5+ B cells and CLL (Table S11). (c) Immunoblotting of EBF1. Protein lysates of CLL and CD19+ B cells from 
healthy donors were analyzed for EBF1 and GAPDH content. Data are representative of 10 CLL analyzed. (d) Expression heatmap of selected KLF family 
members in CLL and CD5+ B cells. GEP of normal CD5+ B cell subsets and CLL were filtered for KLF family members with differential expression within the 
four cell types (Table S12). Depicted are normalized signal intensities. (e) Differentially expressed transcripts (>twofold change; P < 0.05; FDR < 0.05) 
between uCLL and CD5+CD27- B cells (HG U133 array), but not (<twofold change; P < 0.05; FDR < 0.05) between uCLL and bulk conventional B cells 
(naive and memory B cells combined). All transcripts in this list (Table S14) are considered as deregulated genes in CLL that were not detectable (or 
underestimated) in analyses including conventional CD19+ B cells. Normalized signal intensities are shown. 

Several genes show similarly low transcription levels in 
CLL and CD5+ but not conventional B cell subsets (Fig. 5 b 
and Table S11). FCRL4 was previously reported to be 
down-regulated in CLL (Kazemi et al., 2008), with the potential to 
dampen BCR signaling and enhance TLR signaling (Sohn 
et al., 2011). Similarly, RORA, a potent negative regulator 
of NF-κB signaling (Delerive et al., 2001), and DUSP1, 
a negative regulator of MAPK signaling, are already 
down-regulated in mature CD5+ B cells, contributing to the idea of 
unique signaling characteristics in CD5+ B cells that are (at 
least partially) inherited by CLL. 

Finally, TCL1A and ZAP70, two molecules characteristically 
expressed in CLL, preferentially in uCLL cases with 
poor clinical outcome, are already transcribed at similarly 
high levels not only in conventional naive B cells, but also in 
mature CD5+ B cells (Fig. 5 b and Table S11). 

Evaluation of deregulated genes and implications 
for pathogenetic mechanisms in CLL 

As we propose mature CD5+ B cells as the cells of origin of 
CLL, we screened the respective transcription patterns for 
deregulated genes in CLL that were not identified in previous 
studies that used memory or bulk B cells for comparison. 
215 annotated transcripts with consistent differential expression 
(on both array platforms) between CD5+ B cell subsets and CLL 
included 72 genes (30%) known to be deregulated in CLL, as well 
as 143 genes (70%) that, to the best of our knowledge, have not 
been reported so far (unpublished data). Of the 72 previously 
published genes, 71 showed an expression tendency that was in line 
with published data. Thus, the GEPs of CLL and CD5+ B cells 
are of high quality and neatly reproduce a large number of 
well-described CLL features. 

Among newly identified deregulated genes, the B cell lineage 
transcription factor EBF1 was significantly down-regulated 
in CLL compared with CD5+ or conventional B cells. This 
observation was supported by increased expression of 
EBF1-induced genes in normal B cells, but not in CLL (Fig. 6 l). 
EBF1-repressed genes were not decreased at statistically 
significant levels. Whereas EBF1 was detectable in conventional 
B cells by immunoblotting, it was below the detection limit 
in mCLL and uCLL (Fig. 5 c). The low expression of EBF1 
may lead to reduced levels of numerous B cell signaling factors, 
thereby contributing to an anergic signature of CLL cases 
(Mockridge et al., 2007; Muzio et al., 2008) and low 
susceptibility to host immune recognition (Schultze et al., 1996). 

Similarly, gene sets that were up-regulated in plasma cells 
instead of B cells (Fig. 6 m) are significantly enriched in CD5+ 
B cells. Moreover, the flow cytometric analysis of intracellular 
Ig verifies a low content of cytoplasmic Ig in CLL 
(unpublished data). However, the reduced potential to differentiate 
into antibody-secreting cells may also be more directly 
mediated by down-regulation of plasma cell factors. BACH2, a 
transcriptional repressor inhibiting plasma cell differentiation, 
was expressed in CLL and genes containing a predicted 
BACH2-binding site in their promoter region were significantly 
decreased in CLL (unpublished data). 

Members of the KLF family of transcription factors are 
considered to have tumor-suppressive properties. We observed 
a down-regulation of several KLF factors in CLL when 
compared with CD5+ B cells (Fig. 5 d). Although the fold-change of 
down-regulation of some of these KLF factors is low (Table S12), 
the decrease of these transcriptional regulators is consistent 
and may therefore add up to an important impact on CLL 
biology. Indeed, KLF2-induced genes (Haaland et al., 2005) 
were significantly increased in CD5+ B cells versus CLL (Fig. 6 n). 
This includes genes like CDKN2D, an inhibitor of CDK4, 
which in turn has been proposed to keep CLL cells blocked 
in the G1 cell cycle stage and contribute to tumor 
accumulation (Wolowiec et al., 2001). Moreover, the CLL 
growth-inhibiting TGFB1 (DeCoteau et al., 1997) and caspases 8 and 
10, both important regulators of programmed cell death in 
CLL (Enjuanes et al., 2008), are KLF2 targets, found to be 
down-regulated in CLL. KLF2-repressed genes were not 
decreased at statistically significant levels. Immunofluorescence 
analysis of KLF2 and KLF3 confirmed that nuclear expression 
of both factors was consistently detectable in a fraction of 
normal CD5+ B cells but virtually absent in CLL (Fig. 6 
and Table S13). Although the transcription level in CD5+ 
B cells was only marginally higher than in CLL (1.5-fold 
change), KLF3 expression was more pronounced on protein 
level, with bright nuclear stainings in 28 to 40% of CD5+ 
B cells, whereas KLF2 was detected in ≤1% of CD5+ 
B cells analyzed (Fig. 6, c-f). 

In addition, we filtered for genes with differential expression 
between CLL and CD5+ but not conventional B cells (Fig. 5 e 
and Table S14), i.e., deregulated genes in CLL that were not 
detectable (or underestimated) in analyses including only 
conventional B cells for comparison to CLL. A substantial number of 
these transcripts included typical B cell genes like CD20 
(MS4A1), CD21 (CR2), CD40, CD79B, and IGHD, affirming 
the idea of a decreased B cell phenotype in CLL, mediated by 
EBF1 down-regulation (Fig. 5 c and unpublished data). 
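
The filtering step described above can be illustrated with a minimal sketch. It assumes that per-gene fold changes and FDR-adjusted P values have already been computed for both comparisons; the function names, the dictionary layout, and the toy values are hypothetical and are not taken from the study. The thresholds (more than twofold change, FDR < 0.05) follow the legend of Fig. 5 e, and "not deregulated" is simplified here to "fails that same criterion".

```python
from typing import Dict, List, Tuple

# Per-gene statistics for one comparison: (fold change, FDR-adjusted P value).
# Fold change is a positive ratio (e.g. 0.25 means fourfold lower in CLL).
Stats = Dict[str, Tuple[float, float]]


def is_deregulated(fold_change: float, fdr: float,
                   min_fold: float = 2.0, max_fdr: float = 0.05) -> bool:
    """True if the change exceeds `min_fold` in either direction at FDR < `max_fdr`."""
    magnitude = max(fold_change, 1.0 / fold_change)
    return magnitude > min_fold and fdr < max_fdr


def cd5_specific_deregulation(cll_vs_cd5: Stats,
                              cll_vs_conventional: Stats) -> List[str]:
    """Genes deregulated in CLL versus CD5+ B cells but not versus conventional B cells."""
    hits = []
    for gene, (fc, fdr) in cll_vs_cd5.items():
        if gene not in cll_vs_conventional:
            continue  # skip genes that were not measured in both comparisons
        fc_conv, fdr_conv = cll_vs_conventional[gene]
        if is_deregulated(fc, fdr) and not is_deregulated(fc_conv, fdr_conv):
            hits.append(gene)
    return sorted(hits)


# Toy values, invented for illustration only.
cll_vs_cd5 = {"GENE_A": (0.3, 0.01), "GENE_B": (2.5, 0.001), "GENE_C": (1.2, 0.5)}
cll_vs_conv = {"GENE_A": (0.8, 0.40), "GENE_B": (3.0, 0.002), "GENE_C": (1.1, 0.6)}
print(cd5_specific_deregulation(cll_vs_cd5, cll_vs_conv))
```

In this toy input only GENE_A passes: it is changed more than twofold against the CD5+ reference but not against the conventional reference, which is the pattern the filter is meant to expose.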

SIPA1, a RAP1 GTPase-activating protein, is significantly 
down-regulated in CLL compared with CD5+, but not 
conventional B cells. This is remarkable, as SIPA1-deficient 
mice mostly develop myeloproliferative disorders, but in few 
instances accumulate self-reactive CD5+ B cells in the 
peritoneal cavity (Ishida et al., 2006). 

Figure 6. Fluorescence microscopic analysis and GSEA of CD5+ B cells and CLL. CD5+ B cells and CLL cells were stained for intracellular MYC, 
KLF2, or KLF3 expression (green). DNA and actin were stained with Hoechst 33258 (blue) and Phalloidin-TRITC (red), respectively. MYC was expressed in 
<1% of CD5+ B cells (a), as compared with isotype negative control stainings (b). Sporadic KLF2 expression was detectable in CD5+ B cells (c) but never in 
CLL (d). KLF3 was expressed in 28-40% of CD5+ B cells (e) and in 1-5% of CLL cells (f) as determined by two independent blind studies of two normal 
CD5+ B cell samples and two CLL each (Table S13). Pictures are representative of four healthy donors and five CLL analyzed. (g-n) shows selected plots 
from a GSEA based on 24,000 probe sets of 5 CD5+CD27- and 5 CD5+CD27+ B cell GEP combined with 7 conventional naive B cell GEP or 9 CLL GEP, 

Several genes involved in G protein signaling were found 
to be deregulated by a factor of more than twofold, including 
GNG7, GNG11, RAP1GDS1, RASGRP2, RASSF2, VAV1, 
VAV3, and RAB31. To what extent these findings contribute 
to CLL pathobiology remains unresolved. However, it is 
obvious that a comparison of CLL to its mature CD5+ B cell 
precursor gives rise to several previously unreported aspects. 

Similarly, we identified by GSEA several gene sets with 
preferential deregulation between CLL and CD5+ B cells. 
FOXO1 was described to be expressed in CLL at levels 
similar to that of conventional B cells (Xie et al., 2012). However, 
in comparison to mature CD5+ B cells, FOXO1 target genes 
are significantly enriched in CLL (unpublished data). Also 
enriched in CLL are ABC family transporters, which are 
responsible for membrane permeability for a broad range of nutrients 
and compounds, and genes with a PITX2-binding site in the 
promoter region (unpublished data). Gene sets significantly 
enriched in CD5+ B cells (and thus down-regulated upon 
transition into CLL) are, for example, genes with a 
POU6F1-binding site in the promoter region (POU6F1 being a known 
proliferation-driving homeobox transcription factor in adenocarcinoma) 
and HIF1- and SRF-driven genes (unpublished data). 

DISCUSSION 

We aimed to determine the cellular origin of CLL by 
comparing the global gene expression of mCLL and uCLL to the 
major human mature B cell subsets. Unsupervised 
multiparametric analyses of >10,000 transcripts pointed to CD5+ B cells 
as the normal B cell subset with the most similar gene 
expression to CLL. This finding was confirmed by refined assays 
based on normal B cell subset-specific expression patterns and, 
notably, was already reflected by pairwise comparisons at the 
single-gene level. In a complementary approach, the presence 
of genes that are differentially expressed between CLL and 
other B cell lymphomas identified gene expression patterns 
typical for CLL in CD5+ B cells. Thus, we conclude from 
our transcriptional characterization that mature CD5+ B cells 
are in all probability the cell of origin of this leukemia. This 
refers to the cell that was the direct precursor of the tumor 
clone. As tumorigenesis is a multi-step process, first 
transforming events may have happened in earlier differentiation 
stages, perhaps even in hematopoietic stem cells of CLL 
patients (Kikushige et al., 2011). 

An unexpected finding was that not only uCLL but also 
mCLL was highly similar to CD5+ B cells, which are mostly 
IgV unmutated (Brezinschek et al., 1997; Fischer et al., 1997; 
Dono et al., 2007). We specified a distinct subset of 
CD5+CD27+ B cells (~1% of total B cells), with the vast 
majority of these cells carrying somatically mutated IGHV 
genes. Addressing the current discussion on somatic 
hypermutation outside the GC (Kruetzmann et al., 2003; Weill 
et al., 2009), we showed that CD5+CD27+ B cells carry 
mutations in BCL6 as a specific hallmark of B cells 
undergoing hypermutation in the GC (because only in GC B cells 
BCL6 is highly transcribed and strong transcription of a gene 
is essential for somatic hypermutation; Pasqualucci et al., 
1998; Seifert and Küppers, 2009). Thus, we identified a 
distinct subset of somatically mutated post-GC CD5+ memory 
B cells. The high similarity of the two CD5+ B cell subsets 
in terms of their gene expression supports the idea that 
CD5+CD27+ B cells derive from CD5+CD27- B cells that in 
rare instances can undergo GC reactions. 

A separate GEP analysis revealed that mCLL tend to be 
more similar to CD5+CD27+ B cells, and uCLL tend to be 
more similar to CD5+CD27- B cells. This was also seen when 
we investigated the CD5+ B cell gene signatures of >100 
CLL from an independent cohort. As consistent gene 
expression differences discriminating mCLL from uCLL are 
scarce, and CLL subsets are more similar to each other than 
to any other normal B cell subset, the potential of CD5+ 
B cell transcription patterns to translate such minor 
differences into significant CLL subtype predictions is all the more 
impressive. Thus, it appears that uCLL derive from CD5+CD27- 
B cells, whereas mCLL derive from CD5+CD27+ B cells. 
This interpretation is supported by the shared pattern of 
IGHV gene mutations (uCLL and unmutated CD5+CD27- 
B cells versus mCLL and mutated CD5+CD27+ B cells) and the 
distribution of mutated and unmutated stereotyped IGHV 
rearrangements in the two CD5+ B cell subsets, respectively. 
Moreover, a derivation of mCLL from post-GC CD5+ 
memory B cells is further supported by the fact that not only 
normal CD5+CD27+ B cells, but also mCLL carry BCL6 
mutations as a genetic trait of a GC passage (Pasqualucci et al., 
2000; Jantus-Lewintre et al., 2009). The potential 
relationship between a rare, CD5-expressing B cell population from 
human tonsils, partially located in GCs, and CLL, has been 
previously described (Caligaris-Cappio et al., 1982). Notably, 
the expression of CD27 on uCLL is not contradictory to a 
derivation from CD5+CD27- B cells, as CD27 is also 
up-regulated upon T cell-independent B cell activation (Huggins 
et al., 2007), and expressed on a variety of leukemias and 
lymphomas (van Oers et al., 1993), independent of their 
cellular origin (Dong et al., 2002). 

In two separate IGHV gene PCR analyses, stereotyped 
BCR were significantly enriched in CD5+ B cells as compared 
with conventional B cells. This independently supports the 
conclusion from the transcriptome studies that CD5+ B cells 
are the precursor population of CLL. This conclusion is not 
contradicted by another study of stereotyped VH1-69 
rearrangements among unmutated V genes (Forconi et al., 2010), 
because that study did not separate B cell subsets and CD5+ 
B cells account for ~15% of unmutated B cells in adult PB. 
Our second IGHV gene analysis included CD43+ B cells, as 
these putative B1 cells were recently proposed to be CLL 
precursors (Griffin et al., 2011). About two thirds of CD43+ 
B cells are CD5+, but unexpectedly, they are also defined as 
CD27+ B cells (Griffin et al., 2011). As the IGHV gene PCR 
with VH primers for IGHV segments frequently used by 
stereotyped BCR did not amplify many different 
rearrangements from the isolated CD43+CD27+ B cells, this 
population is either oligoclonal or it does not frequently use these 
IGHV genes of dominant stereotyped groups, or both. 
Nevertheless, no stereotyped rearrangements were found among 
CD43+ B cells. Although we do not exclude that in some 
instances CD43+CD5+ B cells can transform into CLL, these 
cells do not represent the specific CLL precursors. It should 
also be noted that in our analyses, CD43+CD27+ B cells are 
much rarer (on average 3% of PB B cells from 7 donors) than 
reported by Griffin et al. (2011; 13% of PB B cells). 

Figure 6 (continued). irrespective of the mutation status. Normalized enrichment score (NES), nominal p-value (p), and FDR (q) are given for each plot. (g-k) CD5+CD27- B cells 
are shown on the left side (red) of the plots, conventional naive B cells on the right side (blue). (g) Immediate early genes, (h) delayed early genes, (i) NF-κB 
target genes, (j) homeostatic proliferation genes, (k) MYC/MAX target genes. (l-n) Pools of CD5+ B cell subsets are shown on the left side (red), CLL on the 
right side (blue). (l) EBF1 target genes, (m) genes up-regulated in plasma cells versus B cells, (n) KLF2-induced genes. 

The transcriptome of human mature CD5+ B cells is 
clearly distinct from conventional naive and memory B cells. 
Moreover, the observed clonality and the increased frequency 
of stereotyped receptors in CD5+ B cell subsets is not 
detectable in CD5- B cells. This autonomy of human CD5+ B cells 
is supported by independent studies showing marginal but 
distinct phenotypic features of these cells, e.g., bias in V gene 
repertoire (Brezinschek et al., 1997) or polyreactivity to 
nuclear and cytoplasmic antigens (Herve et al., 2005). 
Furthermore, normal CD5+ B cells lack a clear activation or NF-κB 
signature, arguing against the idea that these cells simply 
represent activated conventional B cells. Hence, CD5 expression 
defines a distinct differentiation stage, or perhaps even a 
separate lineage of human B cells. 

The derivation of CLL from CD5+ B cells has major 
implications for our understanding of its pathogenesis. First, 
CLL often express polyreactive and autoreactive BCR 
specificities (Catera et al., 2008; Chu et al., 2008; Chu et al., 2010), 
and it is an important question whether this holds true for 
normal mature CD5+ B cells as well. Initial studies suggest 
that CD5+ B cells may possibly be reactive only to selected 
autoantigens (Herve et al., 2005). As the autoreactivity of 
antibodies from CLL is partly linked to stereotyped BCR, the 
finding of stereotyped IGHV gene rearrangements in normal 
CD5+ B cells supports the idea that some CD5+ B cells 
express autoreactive receptors and through chronic antigenic 
stimulation might be at risk to undergo malignant 
transformation. In this regard, the relatively frequent occurrence of 
IgV mutated CLL (ca. 50% of cases) as opposed to the rarity 
of mutated normal CD5+ B cells might indicate that in those 
rare instances when an (autoreactive?) CD5+ B cell is driven 
into a GC reaction, it is at increased risk for malignant 
transformation. This may be due to the extensive clonal expansion 
and perhaps the mutagenic environment in the GC (Küppers 
et al., 1999). Furthermore, the fact that mCLL on average 
have a higher mutation load than normal CD5+CD27+ 
B cells indicates that a prolonged GC experience (and hence 
higher mutation load) of these cells leads to a higher 
propensity of malignant transformation. Second, there is convincing 
evidence that CLL may be preceded by monoclonal B cell 
lymphocytosis, which is defined by expansions of mostly 
CD5+ B cells, and found in ~3% of healthy elderly 
individuals. Interestingly, these clones often already carry some of the 
genetic lesions typical for CLL and hence can be regarded as 
premalignant conditions (Landgren et al., 2009). Although 
the present analysis was not designed to clarify the clonal 
composition of human mature CD5+ B cells, it revealed a 
surprising oligoclonality of these cells. Therefore, we extend 
the chain of events in CLL development by identifying 
oligoclonally expanded normal CD5+ B cells, already detectable in 
young healthy adults, as potential precursors of 
monoclonal B cell lymphocytosis in the elderly. Finally, we provide 
evidence that comparison of CLL to mature CD5+ B cells 
identifies deregulated genes with increased sensitivity. Indeed, 
we found numerous novel expression patterns that were not 
recognized in previous studies when CLL were compared 
with bulk or memory B cells, e.g., diminished EBF1 
expression and reduced levels of tumor suppressor genes of the KLF 
family in CLL. Moreover, we provide a selection of genes with 
important functions in lymphocytes, deregulated in CLL, 
which might explain numerous pathophysiological aspects of 
this disease. These include signaling properties, migration 
potential, and metabolic features. 

On the other hand, we identified genes which appeared 
differentially expressed between CLL and conventional B cells 
but turned out to be similarly expressed by normal CD5+ B cells 
and CLL, so that they were not deregulated during cellular 
transformation (Fig. 5 e and Table S14). Thus, we reveal a 
highly similar expression pattern of the normal and malignant 
cells, which is characteristic for CLL. This implies that the 
tumor inherited or adopted normal B cell properties. CLL 
pathobiology may be better understood if the role of these 
genes in normal CD5+ B cells is clarified. 

Importantly, comparing CLL cells to their specific cell of 
origin is also of relevance when deregulated miRNA 
expression in CLL is studied, or epigenetic changes during the 
transformation process in CLL are being evaluated. We conclude 
that the identification of two distinct CD5+ B cell subsets as 
cellular origin of CLL significantly contributes to a better 
understanding of CLL pathobiology. 

MATERIALS AND METHODS 

CLL samples and healthy blood donors. PB samples of CLL patients and 
healthy donors and splenic tissue were analyzed with the approval of the 
ethical review committee of the University of Duisburg-Essen and with 
informed consent of the donors according to our institutional guidelines. At 
the time of sample collection, patients were either untreated or 
treatment-free for >3 mo. Clinical and laboratory data are shown in Table S15. All CLL 
expressed IgM and IgD. Normal PB B cell subsets for GEP were taken from 
healthy donors (age, 23-59 yr; mean, 35 yr); samples for other experiments 
were prepared from buffy coats (age, 18-66 yr). Splenic tissue was derived 
from surgery to repair traumatic rupture or tumor surgery, but without direct 
tumor affection (donor age, 49-76 yr; mean, 59 yr). 

Flow cytometry and cell sorting. B and CLL cells were isolated by Ficoll 
density gradient centrifugation (GE Healthcare) and CD19-MACS strategy 
(Miltenyi Biotec) to a purity of >98%. Samples were FACS-sorted and 
analyzed by staining with anti-CD5-APC (UCHT2; BioLegend), CD43-FITC 
(1G10), CD38-APC (HIT2), CD27-FITC (MT271), CD23-PE (M-L233), 
CD24-PE (ML5), IgG-FITC (G18-145), IgA-FITC (F0316; Dako), CD21-PE 
(1048), IgM-FITC (G20-127), IgD-PE (IA6-2), IgD-PE-Cy7 (IA6-2), 
and CD27-APC (MT271). If not stated otherwise, all antibodies were 
purchased from BD Biosciences (BD). Samples for HG U133 2.0 Plus arrays 
were sorted as: CD5+CD27-CD23+CD24- ("CD5+"), CD23+IgDhighCD27- 
("naive"), IgM+IgD+CD27+, IgM+IgDlow/-CD27+ ("IgM-only"), IgG+/ 
IgA+CD27+ ("class-switched"), IgM+CD21highCD27+ ("sMGZ"). B cells for 
HuGene-1_0-st-v1 arrays, BCL6 and IGHV analysis were negatively 
enriched by the EasySep Human B Cell Enrichment kit 19054 (STEMCELL), 
thus excluding CD43+ B cells. CD5+ B cells were subsequently enriched by 
anti-CD5-APC (UCHT2) and anti-APC MicroBeads (Miltenyi Biotec). 
Samples were sorted as CD5+IgD+CD38lowCD27- (CD5+CD27-) and 
CD5+CD38-CD27+ (CD5+ memory) B cells. CD5+ class-switched B cells 
were sorted as CD5+CD38-CD27+ and IgG+ or IgA+. Transitional B cells 
(CD38highCD24high) were excluded. For the IgV gene PCR, CD43+ B1 
B cells (CD43+CD27+) and B2 B cell subsets were isolated after CD19-MACS. 
CLL samples were sorted as CD5+CD23+ cells. After saturating staining of 
extracellular Ig light chain (IgL), intracellular IgL was stained with the BD 
Cytofix/Cytoperm kit according to the manufacturer's instructions. 
Antibodies used were anti-Ig κ chain (Igκ)-PE and anti-Igκ-FITC (both G20-193; 
BD). FACS data were acquired with a FACSCanto cytometer (BD). 

GEP sample preparation. RNA was extracted from 10,000 cells by the 
Gentra Purescript protocol (Gentra). RNA integrity was assessed by Agilent 
2100 Bioanalyzer (Agilent). Samples with RNA integrity number >9.0 were 
processed by MessageAmp II aRNA amplification kit and MessageAmp II 
Biotin Enhanced kit (Ambion). For the HG U133 2.0 Plus GEP analysis, data 
were generated in three batches, with the first two equally composed of 
naive, IgM-only, IgM+IgD+CD27+, class-switched, and sMGZ B cells, and 
a third batch containing the CD5+ B cell and CLL samples and two naive, 
IgM+IgD+CD27+, and class-switched B cell samples each for batch 
correction. Vsn-normalized data were corrected for batch effect by ComBat 
software (Johnson et al., 2007). For the HuGene-1_0-st-v1 GEP analysis, 50 ng 
RNA was processed with the OVATION Pico WTA System, the WT 
Ovation Exon Module, and the Encore Biotin Module (NuGen). Arrays were 
scanned with a GeneChip Scanner 3000 7G (Affymetrix). GeneChip data 
have been submitted to the GEO database under accession no. GSE36907. 

IGHV gene rearrangement analysis and BCL6 PCR. For IGHV-PCR 
analysis, cells were sorted in duplicate or triplicate aliquots. Genomic DNA 
was extracted by Gentra Puregene Blood kit (QIAGEN). IGHV gene 
rearrangements of the VH1 and VH3 family were amplified in a seminested, 
two-round multiplex PCR assay with the Expand High Fidelity PCR system 
(Roche; Küppers, 2004). 

For the stereotyped BCR-specific PCR, DNA from aliquots of 
2,500-5,000 cells was amplified in a seminested fashion with IGHV gene-specific 
leader exon and two sets of IGHJ primers for 30 cycles, two times. Stereotypy was 
determined by CDR3 amino acid homology of >60% to the consensus 
sequence of one of 48 stereotype subsets (Stamatopoulos et al., 2007). The 
calculation of CDR3 homology with respect to physicochemical properties of 
the amino acids used was performed in a blinded fashion using the DNASIS 
MAX software. Although other algorithms exist to determine stereotypy 
(Darzentas et al., 2010), the approach used here is sufficient, as it aims to 
compare two cell types with the same measure. 
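
To make the >60% homology criterion concrete, the following is a minimal sketch of a class-based CDR3 comparison. It is not the DNASIS MAX algorithm: the physicochemical grouping of amino acids, the function names, the handling of unequal lengths, and the example sequences are all assumptions made for illustration only.

```python
# Coarse physicochemical classes; this particular grouping is an assumption
# made for illustration and is not the scheme used by DNASIS MAX.
CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}
AA_CLASS = {aa: name for name, members in CLASSES.items() for aa in members}


def cdr3_homology(cdr3, consensus):
    """Fraction of positions whose residues fall into the same class.

    Sequences of unequal length are scored as 0.0, a simplification:
    real stereotypy criteria treat length differences explicitly.
    """
    if not cdr3 or len(cdr3) != len(consensus):
        return 0.0
    matches = sum(
        1 for a, b in zip(cdr3.upper(), consensus.upper())
        if a in AA_CLASS and AA_CLASS[a] == AA_CLASS.get(b)
    )
    return matches / len(cdr3)


def stereotyped_subsets(cdr3, consensus_by_subset, threshold=0.60):
    """Names of the subsets whose consensus is matched above the threshold."""
    return [name for name, cons in consensus_by_subset.items()
            if cdr3_homology(cdr3, cons) > threshold]


# Invented sequences and subset names, for illustration only.
consensus = {"subset_X": "ARDYYGSGSY", "subset_Y": "AKWWWWWWWW"}
print(stereotyped_subsets("ARELYGSGTY", consensus))
```

Because the same scoring function is applied to every cell population, the comparison between populations stays internally consistent even if the absolute homology values differ from those of other stereotypy algorithms, which mirrors the reasoning given in the paragraph above.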

The BCL6 major mutation cluster was amplified from 5,000 cell 
aliquots by seminested PCR (Pasqualucci et al., 1998) by the High Fidelity 
PCR System (Roche). 

All PCR products were subcloned using the TOPO TA Cloning kit 
(Invitrogen) and XL1-Blue-competent cells (Agilent). Sequences were 
obtained with the BigDye Deoxy sequencing kit (Applied Biosystems) and an 
automated sequencer (ABI3100; Applied Biosystems). Sequences were analyzed 
by the international ImMunoGeneTics (IMGT) information system and 
Lasergene 8 (DNASTAR) software. Sequence data have been submitted to 
GenBank database under accession nos. JX432019-JX432961. 

Fluorescence microscopy. Expression of KLF2 was determined by 
intracellular staining with anti-KLF2 (665333; R&D Systems) and anti-mouse-Cy2 
(Jackson ImmunoResearch Laboratories). CD5+ B cells were isolated by 
T cell depletion (EasySep Human B Cell Enrichment kit 19054), CD10+ 
transitional B cell depletion (anti-human CD10 MicroBeads; Miltenyi 
Biotec), and subsequent enrichment by rabbit anti-CD5 (SAB4503585; 
Sigma-Aldrich) and anti-rabbit IgG MicroBeads (Miltenyi Biotec). CLL cells were 
purified by CD19-MACS. Expression of KLF3 was assessed by intracellular 
stainings with anti-KLF3 (ab49221; Abcam) and anti-rabbit-Cy2 (Jackson 
ImmunoResearch Laboratories). Expression of MYC was analyzed by 
intracellular staining with anti-c-MYC (9E70; Invitrogen) and anti-rabbit-Cy2 
(Jackson ImmunoResearch Laboratories). All stainings were combined with 
Phalloidin-TRITC (Sigma-Aldrich) and Hoechst 33258 (Roche). 
Fluorescence microscopy was performed on a Zeiss Axio Observer.Z1 fluorescence 
microscope equipped with the respective filter sets and an Apotome. Image 
acquisition was performed via a Plan-Apochromat 63x/1.40 oil objective 
lens (1.46 numerical aperture) and an Axio Cam MRm camera from cell 
suspensions in fluorescent mounting medium (S3023; Dako) at 23°C. Images 
were processed with Axio Vision Rel. 4.8 software (Carl Zeiss). 

Immunoblotting. Equal amounts of protein (15 µg) of CLL and CD19+ 
B cells were loaded onto 6% acrylamide Tris-Glycine gels (Invitrogen) and 
transferred to PVDF membrane (Millipore). Anti-EBF1 (H00001879-M01; 
Abnova) and anti-GAPDH (sc-31915; Santa Cruz Biotechnology, Inc.) 
primary antibodies were used at 1:1,000 dilution. Protein detection was 
performed by HRP-conjugated secondary antibodies (115-036-062; Jackson 
ImmunoResearch Laboratories; sc-2350; Santa Cruz Biotechnology, Inc.) 
and the ECL Plus chemiluminescence detection kit (GE Healthcare). 

Oligonucleotides. IGHV leader-specific primers for IGHV genes 
frequently used by stereotyped receptors are as follows: 
IGHV1-2, 5'-TCT-TCT-TGG-TGG-CAG-CAG-CCA-CAG-GT-3'; 
IGHV1-69, 5'-GGA-CTG-GAC-CTG-GAG-GTT-CCT-CTT-TG-3'; 
IGHV3-11, 5'-TGC-TAT-AAT-AAA-AGG-TGT-CCA-GTG-TC-3'; 
IGHV3-21, 5'-CGA-GGA-TTC-ACC-ATG-GAA-CTG-GGG-CTC-C-3'; 
IGHV3-48, 5'-TGC-TGG-GTT-TTC-CTT-GTT-GCT-ATT-TTA-G-3'; 
IGHV4-34, 5'-CAG-GTG-CAG-CTA-CAG-CAG-TGG-GGC-G-3'; 
IGHV4-39, 5'-TGT-CTC-TGG-TGG-CTC-CAT-CAG-CAG-TAG-3'. 

Statistical analysis. Data were analyzed with GeneSpring GX software 
(Affymetrix) and R software for statistical computing (R Development Core 
Team [2008]; http://www.R-project.org). Probe sets with a minimum raw 
signal of 50 and at least 4 present calls in at least one condition, according to 
MAS5 software, were used for further analysis. Multivariate data analysis was 
performed with ANOVA and Tukey post-hoc testing procedures, and 
pairwise comparisons were tested for statistical significance by Student's t test 
(P < 0.05). The Benjamini-Hochberg method was used for multiple testing 
correction. Gene set enrichment analysis was performed with the GSEA2 
software (Subramanian et al., 2005). The significance of association of 
stereotyped receptor usage with one of the B cell subsets analyzed was 
determined by two-tailed Fisher's exact tests. 
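
Two of the procedures named above, the Benjamini-Hochberg correction and the two-tailed Fisher's exact test, are compact enough to sketch directly. The code below is a plain-Python illustration using only the standard library; it is not the GeneSpring or R implementation used in the study, the function names are hypothetical, and the P values and counts in the usage lines are invented.

```python
from math import comb


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, total)


# Invented numbers, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
# Rows: two cell populations; columns: stereotyped vs. non-stereotyped sequences.
print(fisher_exact_two_tailed(8, 2, 1, 5))
```

For a stereotypy comparison, the rows of the 2x2 table would be the two B cell populations and the columns the numbers of stereotyped and non-stereotyped rearrangements; `math.comb` requires Python 3.8 or later.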

Online supplemental material. Table S1 lists the transcripts for the 
hierarchical clustering in Fig. 1. Table S2 gives the numbers of differentially 
expressed genes between CLL and mature B cell subsets. Table S3 lists the 
details of the V gene analysis of CD5+ and conventional B cell subsets. Table S4 
summarizes the BCL6 analysis of CD5+ B cell subsets. Table S5 lists the transcripts 
for the hierarchical clustering in Fig. 4 a. Table S6 lists the transcripts and 
mutation status of the samples used in the PCA in Fig. 4 c. Table S7 gives the 
stereotypic IGHV rearrangements from our study. Table S8 gives a detailed 
overview on the PCR analysis of CD5+, CD43+, and conventional B cells 
for stereotyped IGHV rearrangements. Table S9 lists the transcripts for the 
heatmap in Fig. 5 a. Table S10 is a list of NF-κB target genes in human 
B cells. Table S11 lists the transcripts for the heatmap in Fig. 5 b. Table S12 
gives the raw signal values for the heatmap in Fig. 5 d. Table S13 gives a 
detailed overview on the KLF3 nuclear expression pattern of CLL and CD5+ 
B cells. Table S14 lists the genes with differential expression between CLL 
and CD5+, but not conventional B cells. Table S15 summarizes the patient 
characteristics. Online supplemental material is available at 
http://www.jem.org/cgi/content/full/jem.20120833/DC1. 